Abstract

This paper studies fundamental questions in computational learning theory from a quantum 
computation perspective. We consider quantum versions of two well-studied classical learning 
models: Angluin's model of exact learning from membership queries and Valiant's Probably 
Approximately Correct (PAC) model of learning from random examples. We give positive and 
negative results for quantum versus classical learnability. For each of the two learning models 
described above, we show that any concept class is information-theoretically learnable from 
polynomially many quantum examples if and only if it is information-theoretically learnable 
from polynomially many classical examples. In contrast to this information-theoretic equivalence 
between quantum and classical learnability, though, we observe that a separation does exist
between efficient quantum and classical learnability. For both the model of exact learning
from membership queries and the PAC model, we show that under a widely held computational
hardness assumption for classical computation (the intractability of factoring), there is a concept
class which is polynomial-time learnable in the quantum version but not in the classical version 
of the model. 
1 Introduction



1.1 Motivation 

In recent years many researchers have investigated the power of quantum computers which can
query a black-box oracle for an unknown function [3], [5], [?], [10], [13], [15], [17], [?], [20], [27], [?]. The
broad goal of research in this area is to understand the relationship between the number of quantum
versus classical oracle queries which are required to answer various questions about the function
computed by the oracle. For example, a well-known result due to Deutsch and Jozsa [15] shows
that exponentially fewer queries are required in the quantum model in order to determine with
certainty whether a black-box oracle computes a constant Boolean function or a function which is
balanced between outputs 0 and 1. More recently, several researchers have studied the number of
quantum oracle queries which are required to determine whether or not the function computed by
a black-box oracle ever assumes a nonzero value [?].

A natural question which arises within this framework is the following: what is the relationship 
between the number of quantum versus classical oracle queries which are required in order to exactly 
identify the function computed by a black-box oracle? Here the goal is not to determine whether 
a black-box function satisfies some particular property (such as ever taking a nonzero value), but 
rather to precisely identify a black-box function which belongs to some restricted class of possible 
functions. The classical version of this problem has been well studied in the computational learning 
theory literature [?], [?], [19], [21], [22], and is known as the problem of exact learning from membership
queries. The question stated above can thus be phrased as follows: what is the relationship between 
the number of quantum versus classical membership queries which are required for exact learning? 
We answer this question in this paper. 

In addition to the model of exact learning from membership queries, we also consider a quantum 
version of Valiant's widely studied PAC learning model which was introduced by Bshouty and
Jackson [12]. While a learning algorithm in the classical PAC model has access to labeled examples
which are drawn from a fixed probability distribution, a learning algorithm in the quantum PAC 
model has access to a fixed quantum superposition of labeled examples. Bshouty and Jackson gave 
a polynomial-time algorithm for a particular learning problem in the quantum PAC model, but 
did not address the general relationship between the number of quantum versus classical examples 
which are required for PAC learning. We answer this question as well. 



1.2 The results 

We show that in an information-theoretic sense, quantum and classical learning are equivalent up 
to polynomial factors: for both the model of exact learning from membership queries and the PAC 
model, there is no learning problem which can be solved using significantly fewer quantum examples 
than classical examples. More precisely, our first main theorem is the following: 

Theorem 1 Let C be any concept class. Then C is exact learnable from a polynomial number 
of quantum membership queries if and only if C is exact learnable from a polynomial number of 
classical membership queries. 

Our second main theorem is an analogous result for quantum versus classical PAC learnability: 

Theorem 2 Let C be any concept class. Then C is PAC learnable from a polynomial number of 
quantum examples if and only if C is PAC learnable from a polynomial number of classical examples.

The proofs of Theorems 1 and 2 use several different quantum lower bound techniques and
demonstrate an interesting relationship between lower bound techniques in quantum computation and
computational learning theory.

Theorems 1 and 2 are information-theoretic rather than computational in nature; they show
that for any learning problem in these two models, if there is a quantum learning algorithm which
uses polynomially many examples, then there must also exist a classical learning algorithm which
uses polynomially many examples. However, Theorems 1 and 2 do not imply that every polynomial-time
quantum learning algorithm must have a polynomial-time classical analogue. In fact, using
known computational hardness results for classical polynomial-time learning algorithms, we show
that the equivalences stated in Theorems 1 and 2 do not hold for efficient learnability. Under
a widely accepted computational hardness assumption for classical computation, the hardness of 
factoring Blum integers, we observe that Shor's polynomial-time factoring algorithm implies that 
for each of the two learning models considered in this paper, there is a concept class which is 
polynomial-time learnable in the quantum version but not in the classical version of the model. 



1.3 Organization 

In Section 2 we define the classical exact learning model and the classical PAC learning model and
describe the quantum computation framework. In Section 3 we prove the information-theoretic
equivalence of quantum and classical exact learning from membership queries (Theorem 1), and in
Section 4 we prove the information-theoretic equivalence of quantum and classical PAC learning
(Theorem 2). Finally, in Section 5 we observe that under a widely accepted computational hardness
assumption for classical computation, in each of these two learning models there is a concept class 
which is quantum learnable in polynomial time but not classically learnable in polynomial time. 



2 Preliminaries 

A concept c over $\{0,1\}^n$ is a Boolean function over the domain $\{0,1\}^n$, or equivalently a concept
can be viewed as a subset $\{x \in \{0,1\}^n : c(x) = 1\}$ of $\{0,1\}^n$. A concept class $C = \cup_{n \geq 1} C_n$ is a
collection of concepts, where $C_n = \{c \in C : c \text{ is a concept over } \{0,1\}^n\}$. For example, $C_n$ might
be the family of all Boolean formulae over n variables which are of size at most $n^2$. We say that a
pair $(x, c(x))$ is a labeled example of the concept c.

While many different learning models have been proposed, most models adhere to the same
basic paradigm: a learning algorithm for a concept class C typically has access to an oracle of
some kind which provides examples that are labeled according to a fixed but unknown target concept
$c \in C$, and the goal of the learning algorithm is to infer (in some sense) the structure of the target
concept c. The two learning models which we discuss in this paper, the model of exact learning
from membership queries and the PAC model, make this rough notion precise in different ways.



2.1 Classical Exact Learning from Membership Queries 

The model of exact learning from membership queries was introduced by Angluin and has since
been widely studied [?], [19], [21], [22]. In this model the learning algorithm has access to a
membership oracle $MQ_c$ where $c \in C_n$ is the unknown target concept. When given an input string
$x \in \{0,1\}^n$, in one time step the oracle $MQ_c$ returns the bit $c(x)$; such an invocation is known
as a membership query since the oracle's answer tells whether or not $x \in c$ (viewing c as a subset
of $\{0,1\}^n$). The goal of the learning algorithm is to construct a hypothesis $h : \{0,1\}^n \to \{0,1\}$
which is logically equivalent to c, i.e. $h(x) = c(x)$ for all $x \in \{0,1\}^n$. Formally, we say that an
algorithm A (a probabilistic Turing machine) is an exact learning algorithm for C using membership
queries if for all $n \geq 1$, for all $c \in C_n$, if A is given n and access to $MQ_c$, then with probability at
least 2/3 algorithm A outputs a representation of a Boolean circuit h such that $h(x) = c(x)$ for all
$x \in \{0,1\}^n$. The sample complexity $T(n)$ of a learning algorithm A for C is the maximum number
of calls to $MQ_c$ which A ever makes for any $c \in C_n$. We say that C is exact learnable if there is
a learning algorithm for C which has poly(n) sample complexity, and we say that C is efficiently
exact learnable if there is a learning algorithm for C which runs in poly(n) time.
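
The membership-query model can be made concrete with a small sketch. The code below is an illustration only, not an algorithm from this paper: the helper names `make_mq` and `exact_learn` and the toy class of "dictator" concepts are assumptions introduced here. The learner eliminates concepts that disagree with the oracle and counts its calls, which is the quantity the sample complexity $T(n)$ measures.

```python
from itertools import product

def make_mq(c):
    """Membership oracle MQ_c, wrapped so that the number of calls is recorded."""
    calls = {"count": 0}
    def mq(x):
        calls["count"] += 1
        return c(x)
    return mq, calls

def exact_learn(concepts, n, mq):
    """Return a concept consistent with MQ_c, querying only informative points.

    Assumes the target belongs to `concepts` and that distinct concepts differ
    somewhere on {0,1}^n. A simple elimination learner, not an optimal one.
    """
    alive = list(concepts)
    for x in product((0, 1), repeat=n):
        if len({c(x) for c in alive}) > 1:      # surviving concepts disagree on x
            label = mq(x)                       # one membership query
            alive = [c for c in alive if c(x) == label]
    return alive[0]

# Hypothetical toy class: the n "dictator" concepts c_i(x) = x_i.
n = 3
concepts = [lambda x, i=i: x[i] for i in range(n)]
mq, calls = make_mq(concepts[1])
h = exact_learn(concepts, n, mq)
```

On this toy class the learner identifies the target after two queries, since each informative query removes at least one surviving concept.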



2.2 Classical PAC Learning 

The PAC (Probably Approximately Correct) model of concept learning was introduced by Valiant
in [?] and has since been extensively studied [?], [24]. In this model the learning algorithm has
access to an example oracle $EX(c, \mathcal{D})$ where $c \in C_n$ is the unknown target concept and $\mathcal{D}$ is an
unknown distribution over $\{0,1\}^n$. The oracle $EX(c, \mathcal{D})$ takes no inputs; when invoked, in one time
step it returns a labeled example $(x, c(x))$ where $x \in \{0,1\}^n$ is randomly selected according to the
distribution $\mathcal{D}$. The goal of the learning algorithm is to generate a hypothesis $h : \{0,1\}^n \to \{0,1\}$
which is an $\epsilon$-approximator for c under $\mathcal{D}$, i.e. a hypothesis h such that $\Pr_{x \in \mathcal{D}}[h(x) \neq c(x)] \leq \epsilon$.
An algorithm A (again a probabilistic Turing machine) is a PAC learning algorithm for C if the
following condition holds: for all $n \geq 1$ and $0 < \epsilon, \delta < 1$, for all $c \in C_n$, for all distributions $\mathcal{D}$ over
$\{0,1\}^n$, if A is given $n, \epsilon, \delta$ and access to $EX(c, \mathcal{D})$, then with probability at least $1 - \delta$ algorithm
A outputs a representation of a circuit h which is an $\epsilon$-approximator for c under $\mathcal{D}$. The sample
complexity $T(n, \epsilon, \delta)$ of a learning algorithm A for C is the maximum number of calls to $EX(c, \mathcal{D})$
which A ever makes for any concept $c \in C_n$ and any distribution $\mathcal{D}$ over $\{0,1\}^n$. We say that C is
PAC learnable if there is a PAC learning algorithm for C which has poly$(n, 1/\epsilon, 1/\delta)$ sample complexity,
and we say that C is efficiently PAC learnable if there is a PAC learning algorithm for C which runs
in poly$(n, 1/\epsilon, 1/\delta)$ time.
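
As a rough illustration of these definitions, the sketch below models the example oracle and the $\epsilon$-approximator condition for a distribution given as an explicit table. The names `make_ex`, `error`, and `is_eps_approximator`, and the particular target, hypothesis, and distribution, are assumptions chosen for this example.

```python
import random

def make_ex(c, dist, rng):
    """Example oracle EX(c, D): each call returns (x, c(x)) with x drawn from D."""
    points = list(dist)
    weights = [dist[x] for x in points]
    def ex():
        x = rng.choices(points, weights=weights)[0]
        return x, c(x)
    return ex

def error(h, c, dist):
    """Pr_{x ~ D}[h(x) != c(x)], computed exactly from the distribution table."""
    return sum(p for x, p in dist.items() if h(x) != c(x))

def is_eps_approximator(h, c, dist, eps):
    """True when h is an eps-approximator for c under D."""
    return error(h, c, dist) <= eps

# Hypothetical instance over {0,1}^2: target AND, hypothesis "always 0".
dist = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
c = lambda x: x[0] & x[1]
h = lambda x: 0
ex = make_ex(c, dist, random.Random(0))
sample = ex()
```

Here the hypothesis errs only on the point (1, 1), so its error is the probability mass of that point.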



2.3 Quantum Computation 



Detailed descriptions of the quantum computation model can be found in [14]; here we outline
only the basics using the terminology of quantum networks as presented in [?]. A quantum network
$\mathcal{N}$ is a quantum circuit (over some standard basis augmented with one oracle gate) which acts on
an m-bit quantum register; the computational basis states of this register are the $2^m$ binary strings
of length m. A quantum network can be viewed as a sequence of unitary transformations

$$U_0, O_1, U_1, O_2, \ldots, U_{T-1}, O_T, U_T,$$

where each $U_i$ is an arbitrary unitary transformation on m qubits and each $O_i$ is a unitary
transformation which corresponds to an oracle call (footnote 1). Such a network is said to have query
complexity T. At every stage in the execution of the network, the current state of the register can be
represented as a superposition $\sum_{z \in \{0,1\}^m} \alpha_z |z\rangle$ where the $\alpha_z$ are complex numbers which satisfy
$\sum_{z \in \{0,1\}^m} \|\alpha_z\|^2 = 1$. If this state is measured, then with probability $\|\alpha_z\|^2$ the string $z \in \{0,1\}^m$
is observed and the state collapses down to $|z\rangle$. After the final transformation $U_T$ takes place, a
measurement is performed on some subset of the bits in the register and the observed value (a
classical bit string) is the output of the computation.

Several points deserve mention here. First, since the information which our quantum network
uses for its computation comes from the oracle calls, we may stipulate that the initial state of
the quantum register is always $|0^m\rangle$. Second, as described above each $U_i$ can be an arbitrarily
complicated unitary transformation (as long as it does not contain any oracle calls) which may
require a large quantum circuit to implement. This is of small concern to us since we are chiefly
interested in query complexity and not circuit size. Third, as defined above our quantum networks
can make only one measurement at the very end of the computation; this is an inessential restriction
since any algorithm which uses intermediate measurements can be modified to an algorithm which
makes only one final measurement. Finally, we have not specified just how the oracle calls $O_i$ work;
we address this point separately in Sections 3.1 and 4.1 for each type of oracle.

Footnote 1. Since there is only one kind of oracle gate, each $O_i$ is the same transformation.

If $|\phi\rangle = \sum_z \alpha_z |z\rangle$ and $|\psi\rangle = \sum_z \beta_z |z\rangle$ are two superpositions of basis states, then the Euclidean
distance between $|\phi\rangle$ and $|\psi\rangle$ is $\| |\phi\rangle - |\psi\rangle \| = (\sum_z |\alpha_z - \beta_z|^2)^{1/2}$. The total variation distance between
two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ is defined to be $\sum_x |\mathcal{D}_1(x) - \mathcal{D}_2(x)|$. The following fact (Lemma 3.2.6
of [?]), which relates the Euclidean distance between two superpositions and the total variation
distance between the distributions induced by measuring the two superpositions, will be useful:

Fact 3 Let $|\phi\rangle$ and $|\psi\rangle$ be two unit-length superpositions which represent possible states of a
quantum register. If the Euclidean distance $\| |\phi\rangle - |\psi\rangle \|$ is at most $\epsilon$, then performing the same observation
on $|\phi\rangle$ and $|\psi\rangle$ induces distributions $\mathcal{D}_\phi$ and $\mathcal{D}_\psi$ which have total variation distance at most $4\epsilon$.
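
A small numerical check can make the two distances concrete. The sketch below is illustrative only: it represents a superposition as a dictionary from basis strings to amplitudes (an assumption of this example), measures every bit, and compares the total variation distance, in the unnormalized convention used above, with four times the Euclidean distance.

```python
import math

def euclidean_distance(phi, psi):
    """Euclidean distance between two amplitude dicts keyed by basis string."""
    keys = set(phi) | set(psi)
    return math.sqrt(sum(abs(phi.get(z, 0) - psi.get(z, 0)) ** 2 for z in keys))

def measurement_distribution(state):
    """Distribution induced by observing every bit: Pr[z] = |alpha_z|^2."""
    return {z: abs(a) ** 2 for z, a in state.items()}

def total_variation(d1, d2):
    """Sum_x |D1(x) - D2(x)|, the convention of the text (no factor 1/2)."""
    keys = set(d1) | set(d2)
    return sum(abs(d1.get(x, 0) - d2.get(x, 0)) for x in keys)

# Two nearby one-qubit states: |0> and a slight rotation of it.
theta = 0.05
phi = {"0": 1.0, "1": 0.0}
psi = {"0": math.cos(theta), "1": math.sin(theta)}
eps = euclidean_distance(phi, psi)
tv = total_variation(measurement_distribution(phi), measurement_distribution(psi))
```

For this pair the Euclidean distance equals $2\sin(\theta/2)$ and the total variation distance equals $2\sin^2\theta$, comfortably below the $4\epsilon$ bound of Fact 3.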



3 Exact Learning from Quantum Membership Queries 
3.1 Quantum Membership Queries 

A quantum membership oracle $QMQ_c$ is the natural quantum generalization of a classical
membership oracle $MQ_c$: on input a superposition of query strings, the oracle $QMQ_c$ generates the
corresponding superposition of example labels. More formally, a $QMQ_c$ gate maps the basis state
$|x, b\rangle$ (where $x \in \{0,1\}^n$ and $b \in \{0,1\}$) to the state $|x, b \oplus c(x)\rangle$. If $\mathcal{N}$ is a quantum network which
has $QMQ_c$ gates as its oracle gates, then each $O_i$ is the unitary transformation which maps $|x, b, y\rangle$
(where $x \in \{0,1\}^n$, $b \in \{0,1\}$ and $y \in \{0,1\}^{m-n-1}$) to $|x, b \oplus c(x), y\rangle$ (footnote 2). Our $QMQ_c$ oracle is
identical to the well-studied notion of a quantum black-box oracle for c [?].
We discuss the relationship between our work and these results in Section 3.4.
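
The action of a $QMQ_c$ gate on a superposition can be sketched classically by permuting basis states. The code below is an illustrative simulation under assumed conventions: a state is a dictionary from triples (x, b, y) to amplitudes, and the helper names `apply_qmq` and `classical_mq_via_qmq` are introduced here for the example. It also shows why a $QMQ_c$ oracle can stand in for $MQ_c$: querying the basis state $|x, 0\rangle$ returns the label $c(x)$ in the second register.

```python
def apply_qmq(state, c):
    """Apply QMQ_c: |x, b, y> -> |x, b XOR c(x), y>.

    `state` maps basis states (x, b, y) to amplitudes, where x is a tuple of
    n bits, b is a single bit, and y is a tuple of workspace bits.
    """
    out = {}
    for (x, b, y), amp in state.items():
        key = (x, b ^ c(x), y)
        out[key] = out.get(key, 0) + amp
    return out

def classical_mq_via_qmq(c, x):
    """Simulate MQ_c(x) by querying the basis state |x, 0> and reading the label."""
    out = apply_qmq({(x, 0, ()): 1.0}, c)
    key = next(iter(out))
    return key[1]

# Hypothetical target: parity of two bits, queried on a uniform superposition.
c = lambda x: x[0] ^ x[1]
state = {((a, b), 0, ()): 0.5 for a in (0, 1) for b in (0, 1)}
after = apply_qmq(state, c)
```

Applying the gate twice restores the original state, which reflects the fact that the map is a permutation of basis states and hence unitary.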

A quantum exact learning algorithm for C is a family of quantum networks $\mathcal{N}_1, \mathcal{N}_2, \ldots$, where
each network $\mathcal{N}_n$ has a fixed architecture independent of the target concept $c \in C_n$, with the
following property: for all $n \geq 1$, for all $c \in C_n$, if $\mathcal{N}_n$'s oracle gates are instantiated as $QMQ_c$
gates, then with probability at least 2/3 the network $\mathcal{N}_n$ outputs a representation of a (classical)
Boolean circuit $h : \{0,1\}^n \to \{0,1\}$ such that $h(x) = c(x)$ for all $x \in \{0,1\}^n$. The quantum sample
complexity of a quantum exact learning algorithm for C is $T(n)$, where $T(n)$ is the query complexity
of $\mathcal{N}_n$. We say that C is exact learnable from quantum membership queries if there is a quantum
exact learning algorithm for C which has poly(n) quantum sample complexity, and we say that C
is efficiently quantum exact learnable if each network $\mathcal{N}_n$ is of poly(n) size.



3.2 Lower Bounds on Classical and Quantum Exact Learning 

Two different lower bounds are known for the number of (classical) membership queries which are 
required to exact learn any concept class. In this section we prove two analogous lower bounds on 
the number of quantum membership queries required to exact learn any concept class. Throughout 
this section for ease of notation we omit the subscript n and write C for C n . 

Footnote 2. Note that each $O_i$ only affects the first n + 1 bits of a basis state. This is without loss of generality since the
transformations $U_j$ can "permute bits" of the network.

3.2.1 A Lower Bound Based on Similarity of Concepts

Consider a set of concepts which are all "similar" in the sense that for every input almost all
concepts in the set agree. Known results in learning theory state that such a concept class must
require a large number of membership queries for exact learning. More formally, let $C' \subseteq C$ be any
subset of C. For $a \in \{0,1\}^n$ and $b \in \{0,1\}$ let $C'_{\langle a,b \rangle}$ denote the set of those concepts in $C'$ which
assign label b to example a, i.e. $C'_{\langle a,b \rangle} = \{c \in C' : c(a) = b\}$. Let $\gamma^{C'}_{\langle a,b \rangle} = |C'_{\langle a,b \rangle}| / |C'|$ be the fraction
of such concepts in $C'$, and let $\gamma^{C'}_a = \min\{\gamma^{C'}_{\langle a,0 \rangle}, \gamma^{C'}_{\langle a,1 \rangle}\}$; thus $\gamma^{C'}_a$ is the minimum fraction of concepts
in $C'$ which can be eliminated by querying $MQ_c$ on the string a. Let $\gamma^{C'} = \max\{\gamma^{C'}_a : a \in \{0,1\}^n\}$.
Finally, let $\hat{\gamma}^C$ be the minimum of $\gamma^{C'}$ across all $C' \subseteq C$ such that $|C'| \geq 2$. Thus

$$\hat{\gamma}^C = \min_{C' \subseteq C,\ |C'| \geq 2} \ \max_{a \in \{0,1\}^n} \ \min_{b \in \{0,1\}} \frac{|C'_{\langle a,b \rangle}|}{|C'|}.$$

Intuitively, the inner min corresponds to the fact that the oracle may provide a worst-case response
to any query; the max corresponds to the fact that the learning algorithm gets to choose the "best"
query point a; and the outer min corresponds to the fact that the learner must succeed no matter
what subset $C'$ of C the target concept is drawn from. Thus $\hat{\gamma}^C$ is small if there is a large set $C'$
of concepts which are all very similar in that any query eliminates only a few concepts from $C'$. If
this is the case then many membership queries should be required to learn C; formally, we have
the following lemma which is a variant of Fact 2 from [?] (the proof is given in Appendix A):

Lemma 4 Any (classical) exact learning algorithm for C must have sample complexity $\Omega(1/\hat{\gamma}^C)$.
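
For a very small concept class, the min-max-min quantity behind Lemma 4 can be evaluated by brute force. The sketch below is illustrative and takes exponential time; the function names and the toy class of singleton concepts over $\{0,1\}^2$ are assumptions of this example.

```python
from fractions import Fraction
from itertools import combinations, product

def gamma_of_subset(sub, n):
    """max over a of min over b of |C'_{<a,b>}| / |C'| for the subset C' = sub."""
    best = Fraction(0)
    for a in product((0, 1), repeat=n):
        ones = sum(c(a) for c in sub)           # concepts labelling a with 1
        best = max(best, Fraction(min(ones, len(sub) - ones), len(sub)))
    return best

def gamma_hat(concepts, n):
    """Minimum of gamma_of_subset over all subsets with at least two concepts."""
    return min(
        gamma_of_subset(sub, n)
        for k in range(2, len(concepts) + 1)
        for sub in combinations(concepts, k)
    )

# Hypothetical toy class: the four singleton concepts c_s(x) = 1 iff x == s.
singletons = [lambda x, s=s: int(x == s) for s in product((0, 1), repeat=2)]
value = gamma_hat(singletons, 2)
```

For the singleton class every query eliminates at most one of the four concepts in the worst case, so the minimum is attained at the full class and equals 1/4.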

We now develop some tools which will enable us to prove a quantum version of Lemma 4. Let
$C' \subseteq C$, $|C'| \geq 2$ be such that $\gamma^{C'} = \hat{\gamma}^C$. Let $c_1, \ldots, c_{|C'|}$ be a listing of the concepts in $C'$. Let the
typical concept for $C'$ be the function $\tilde{c} : \{0,1\}^n \to \{0,1\}$ defined as follows: for all $a \in \{0,1\}^n$,
$\tilde{c}(a)$ is the bit b such that $|C'_{\langle a,b \rangle}| \geq |C'|/2$ (ties are broken arbitrarily; note that a tie occurs only
if $\hat{\gamma}^C = 1/2$). The typical concept $\tilde{c}$ need not belong to $C'$ or even to C. Let the difference matrix
D be the $|C'| \times 2^n$ zero/one matrix where rows are indexed by concepts in $C'$, columns are indexed
by strings in $\{0,1\}^n$, and $D_{i,x} = 1$ iff $c_i(x) \neq \tilde{c}(x)$. By our choice of $C'$ and the definition of $\hat{\gamma}^C$,
each column of D has at most $|C'| \cdot \hat{\gamma}^C$ ones, i.e. the $L_1$ matrix norm of D is $\|D\|_1 \leq |C'| \cdot \hat{\gamma}^C$.

Our quantum lower bound proof uses ideas which were first introduced by Bennett et al. [?]. Let
$\mathcal{N}$ be a fixed quantum network architecture and let $U_0, O_1, \ldots, U_{T-1}, O_T, U_T$ be the corresponding
sequence of transformations. For $1 \leq t \leq T$ let $|\phi_t^c\rangle$ be the state of the quantum register after the
transformations up through $U_{t-1}$ have been performed (we refer to this stage of the computation
as time t) if the oracle gate is $QMQ_c$. As in [?], for $x \in \{0,1\}^n$ let $q_x(|\phi_t^c\rangle)$, the query magnitude
of string x at time t with respect to c, be the sum of the squared magnitudes in $|\phi_t^c\rangle$ of the basis
states which are querying $QMQ_c$ on string x at time t; so if $|\phi_t^c\rangle = \sum_{z \in \{0,1\}^m} \alpha_z |z\rangle$, then

$$q_x(|\phi_t^c\rangle) = \sum_{w \in \{0,1\}^{m-n}} |\alpha_{xw}|^2.$$
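
The query magnitude is simply the probability mass that a state places on basis strings whose first n bits equal x. A minimal sketch, assuming a state is stored as a dictionary from m-bit strings to amplitudes (the function name `query_magnitudes` is introduced here for illustration):

```python
def query_magnitudes(state, n):
    """Map each n-bit prefix x to q_x = sum over suffixes w of |alpha_{xw}|^2.

    `state` maps m-bit basis strings (Python str) to complex amplitudes.
    """
    q = {}
    for z, amp in state.items():
        x = z[:n]                                # the query string is the first n bits
        q[x] = q.get(x, 0.0) + abs(amp) ** 2
    return q

# Hypothetical 3-bit register whose first two bits hold the query string.
state = {"000": 0.5, "001": 0.5, "010": 0.5, "110": 0.5}
q = query_magnitudes(state, 2)
```

Because the state has unit length, the query magnitudes at any fixed time sum to 1 over all strings x.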

The quantity $q_x(|\phi_t^c\rangle)$ can be viewed as the amount of amplitude which the network $\mathcal{N}$ invests
in the query string x to $QMQ_c$ at time t. Intuitively, the final outcome of $\mathcal{N}$'s computation cannot
depend very much on the oracle's responses to queries which have little amplitude invested in them.
Bennett et al. formalized this intuition in the following theorem ([?], Theorem 3.3):

Theorem 5 Let $|\phi_t^c\rangle$ be defined as above. Let $F \subseteq \{0, \ldots, T-1\} \times \{0,1\}^n$ be a set of time-string
pairs such that $\sum_{(t,x) \in F} q_x(|\phi_t^c\rangle) \leq \epsilon^2 / T$. Now suppose the answer to each query instance $(t,x) \in F$
is modified to some arbitrary fixed bit $a_{t,x}$ (these answers need not be consistent with any oracle).
Let $|\tilde{\phi}_t^c\rangle$ be the state of the quantum register at time t if the oracle responses are modified as stated
above. Then $\| |\phi_T^c\rangle - |\tilde{\phi}_T^c\rangle \| \leq \epsilon$.

The following lemma, which is a generalization of Corollary 3.4 from [?], shows that no quantum
learning algorithm which makes few QMQ queries can effectively distinguish many concepts in $C'$
from the typical concept $\tilde{c}$.

Lemma 6 Fix any quantum network architecture $\mathcal{N}$ which has query complexity T. For all $\epsilon > 0$
there is a set $S \subseteq C'$ of cardinality at most $T^2 |C'| \hat{\gamma}^C / \epsilon^2$ such that for all $c \in C' \setminus S$, we have
$\| |\phi_T^c\rangle - |\phi_T^{\tilde{c}}\rangle \| \leq \epsilon$.

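Proof: Since $\| |\phi_t^{\tilde{c}}\rangle \| = 1$ for all $t = 0, 1, \ldots, T-1$, we have $\sum_{t=0}^{T-1} \sum_{x \in \{0,1\}^n} q_x(|\phi_t^{\tilde{c}}\rangle) = T$. Let
$q(|\phi_t^{\tilde{c}}\rangle) \in \mathbb{R}^{2^n}$ be the $2^n$-dimensional vector which has entries indexed by strings $x \in \{0,1\}^n$ and
which has $q_x(|\phi_t^{\tilde{c}}\rangle)$ as its x-th entry. Note that the $L_1$ norm $\|q(|\phi_t^{\tilde{c}}\rangle)\|_1$ is 1 for all $t = 0, \ldots, T-1$.
For any $c_i \in C'$ let $q_{c_i}(|\phi_t^{\tilde{c}}\rangle)$ be defined as $\sum_{x : c_i(x) \neq \tilde{c}(x)} q_x(|\phi_t^{\tilde{c}}\rangle)$. The quantity $q_{c_i}(|\phi_t^{\tilde{c}}\rangle)$ can be
viewed as the total query magnitude with respect to $\tilde{c}$ at time t of those strings which distinguish
$c_i$ from $\tilde{c}$. Note that $D q(|\phi_t^{\tilde{c}}\rangle) \in \mathbb{R}^{|C'|}$ is a $|C'|$-dimensional vector whose i-th element is precisely
$\sum_{x : c_i(x) \neq \tilde{c}(x)} q_x(|\phi_t^{\tilde{c}}\rangle) = q_{c_i}(|\phi_t^{\tilde{c}}\rangle)$. Since $\|D\|_1 \leq |C'| \cdot \hat{\gamma}^C$ and $\|q(|\phi_t^{\tilde{c}}\rangle)\|_1 = 1$, by the basic property
of matrix norms we have that $\|D q(|\phi_t^{\tilde{c}}\rangle)\|_1 \leq |C'| \cdot \hat{\gamma}^C$, i.e. $\sum_{c_i \in C'} q_{c_i}(|\phi_t^{\tilde{c}}\rangle) \leq |C'| \cdot \hat{\gamma}^C$. Hence

$$\sum_{t=0}^{T-1} \sum_{c_i \in C'} q_{c_i}(|\phi_t^{\tilde{c}}\rangle) \leq T |C'| \hat{\gamma}^C.$$

If we let $S = \{c_i \in C' : \sum_{t=0}^{T-1} q_{c_i}(|\phi_t^{\tilde{c}}\rangle) \geq \epsilon^2 / T\}$, by Markov's inequality we have $|S| \leq T^2 |C'| \hat{\gamma}^C / \epsilon^2$.
Finally, if $c \notin S$ then $\sum_{t=0}^{T-1} q_c(|\phi_t^{\tilde{c}}\rangle) < \epsilon^2 / T$. Theorem 5 then implies that $\| |\phi_T^{\tilde{c}}\rangle - |\phi_T^c\rangle \| \leq \epsilon$. ■

Now we can prove our quantum version of Lemma 4.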

Theorem 7 Any quantum exact learning algorithm for C must have sample complexity $\Omega\big((1/\hat{\gamma}^C)^{1/2}\big)$.

Proof: Suppose that $\mathcal{N}$ is a quantum exact learning algorithm for C which makes at most
$T = \frac{1}{64} \cdot (1/\hat{\gamma}^C)^{1/2}$ quantum membership queries. If we take $\epsilon = \frac{1}{32}$ then Lemma 6 implies that there is
a set $S \subseteq C'$ of cardinality at most $|C'|/4$ such that for all $c \in C' \setminus S$ we have $\| |\phi_T^c\rangle - |\phi_T^{\tilde{c}}\rangle \| \leq \frac{1}{32}$. Let
$c_1, c_2$ be any two concepts in $C' \setminus S$. By Fact 3, the probability that $\mathcal{N}$ outputs a circuit equivalent
to $c_1$ can differ by at most $\frac{1}{8}$ if $\mathcal{N}$'s oracle gates are $QMQ_{\tilde{c}}$ as opposed to $QMQ_{c_1}$, and likewise
for $QMQ_{\tilde{c}}$ versus $QMQ_{c_2}$. It follows that the probability that $\mathcal{N}$ outputs a circuit equivalent to $c_1$
can differ by at most $\frac{1}{4}$ if $\mathcal{N}$'s oracle gates are $QMQ_{c_1}$ as opposed to $QMQ_{c_2}$, but this contradicts
the assumption that $\mathcal{N}$ is a quantum exact learning algorithm for C. ■
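
The constants in this argument can be checked with exact rational arithmetic. The sketch below assumes the constants read as $T = \frac{1}{64}(1/\hat{\gamma}^C)^{1/2}$ and $\epsilon = \frac{1}{32}$; the variable names are introduced here for illustration. With these values the bound of Lemma 6 gives $|S| \leq |C'|/4$, Fact 3 gives a change of at most 1/8 per oracle swap, and two swaps give at most 1/4, which is smaller than the gap of 1/3 that a correct learner must exhibit between two distinct targets.

```python
from fractions import Fraction

def lemma6_fraction(t_squared_times_gamma, eps):
    """Upper bound on |S| / |C'| from Lemma 6: T^2 * gamma_hat / eps^2."""
    return t_squared_times_gamma / eps ** 2

# T = (1/64) * gamma_hat^(-1/2) gives T^2 * gamma_hat = (1/64)^2 for any gamma_hat.
t2_gamma = Fraction(1, 64) ** 2
eps = Fraction(1, 32)

excluded = lemma6_fraction(t2_gamma, eps)    # fraction of C' that may lie in S
per_swap = 4 * eps                           # Fact 3: total variation at most 4 * eps
gap = 2 * per_swap                           # c_1 -> typical concept -> c_2

# A correct learner outputs c_1 w.p. >= 2/3 on QMQ_{c_1} and <= 1/3 on QMQ_{c_2}.
required_gap = Fraction(2, 3) - Fraction(1, 3)
```

Since at most a quarter of $C'$ is excluded and $|C'| \geq 2$, at least two concepts remain outside S, which is what the choice of $c_1$ and $c_2$ requires.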



3.2.2 A Lower Bound Based on Concept Class Size 

A second reason why a concept class can require many membership queries is its size. Angluin [?]
has given the following lower bound, incomparable to the bound of Lemma 4, on the number of
membership queries required for classical exact learning (the proof is given in Appendix A):

Lemma 8 Any (classical) exact learning algorithm for C must have sample complexity $\Omega(\log |C|)$.

In this section we prove a variant of this lemma for the quantum model. Our proof uses
ideas from [?] so we introduce some of their notation. Let $N = 2^n$. For each concept $c \in C$, let
$X^c = (X^c_0, \ldots, X^c_{N-1}) \in \{0,1\}^N$ be a vector which represents c as an N-tuple, i.e. $X^c_i = c(x^i)$
where $x^i \in \{0,1\}^n$ is the binary representation of i. From this perspective we may identify C with
a subset of $\{0,1\}^N$, and we may view a $QMQ_c$ gate as a black-box oracle for $X^c$ which maps basis
state $|x^i, b, y\rangle$ to $|x^i, b \oplus X^c_i, y\rangle$.

Using ideas from [17], [?], Beals et al. have proved the following useful lemma, which relates the
query complexity of a quantum network to the degree of a certain polynomial ([?], Lemma 4.2):

Lemma 9 Let $\mathcal{N}$ be a quantum network that makes T queries to a black-box X, and let $B \subseteq \{0,1\}^m$
be a set of basis states. Then there exists a real-valued multilinear polynomial $P_B(X)$ of degree at
most 2T which equals the probability that observing the final state of the network with black-box X
yields a state from B.

We use Lemma 9 to prove the following quantum lower bound based on concept class size:

Theorem 10 Any exact quantum learning algorithm for C must have sample complexity $\Omega\big(\frac{\log |C|}{n}\big)$.



Proof: Let $\mathcal{N}$ be a quantum network which learns C and has query complexity T. For all $c \in C$
we have the following: if $\mathcal{N}$'s oracle gates are $QMQ_c$ gates, then with probability at least 2/3 the
output of $\mathcal{N}$ is a representation of a Boolean circuit h which computes c. Let $c_1, \ldots, c_{|C|}$ be all of the
concepts in C, and let $X^1, \ldots, X^{|C|}$ be the corresponding vectors in $\{0,1\}^N$. For all $i = 1, \ldots, |C|$
let $B_i \subseteq \{0,1\}^m$ be the collection of those basis states which are such that if the final observation
performed by $\mathcal{N}$ yields a state from $B_i$, then the output of $\mathcal{N}$ is a representation of a Boolean
circuit which computes $c_i$. Clearly for $i \neq j$ the sets $B_i$ and $B_j$ are disjoint. By Lemma 9, for each
$i = 1, \ldots, |C|$ there is a real-valued multilinear polynomial $P_i$ of degree at most 2T such that for
all $j = 1, \ldots, |C|$, the value of $P_i(X^j)$ is precisely the probability that the final observation on $\mathcal{N}$
yields a representation of a circuit which computes $c_i$, provided that the oracle gates are $QMQ_{c_j}$
gates. The polynomials $P_i$ thus have the following properties:

1. $P_i(X^i) \geq 2/3$ for all $i = 1, \ldots, |C|$;

2. For any $j = 1, \ldots, |C|$, we have $\sum_{i \neq j} P_i(X^j) \leq 1/3$ (since the total probability across all
possible observations is 1).

Let $N_0 = \sum_{i=0}^{2T} \binom{N}{i}$. For any $X = (X_0, \ldots, X_{N-1}) \in \{0,1\}^N$ let $\tilde{X} \in \{0,1\}^{N_0}$ be the column
vector which has a coordinate for each monic multilinear monomial over $X_0, \ldots, X_{N-1}$ of degree
at most 2T. Thus, for example, if N = 4 and 2T = 2 we have $X = (X_0, X_1, X_2, X_3)$ and

$$\tilde{X}^t = (1, X_0, X_1, X_2, X_3, X_0 X_1, X_0 X_2, X_0 X_3, X_1 X_2, X_1 X_3, X_2 X_3).$$

If V is a column vector in $\mathbb{R}^{N_0}$, then $V^t \tilde{X}$ corresponds to the degree-2T polynomial whose coefficients
are given by the entries of V. For $i = 1, \ldots, |C|$ let $V_i \in \mathbb{R}^{N_0}$ be the column vector which corresponds
to the coefficients of the polynomial $P_i$. Let M be the $|C| \times N_0$ matrix whose i-th row is $V_i^t$; note
that multiplication by M defines a linear transformation from $\mathbb{R}^{N_0}$ to $\mathbb{R}^{|C|}$. Since $V_i^t \tilde{X}^j$ is precisely
$P_i(X^j)$, the product $M \tilde{X}^j$ is a column vector in $\mathbb{R}^{|C|}$ which has $P_i(X^j)$ as its i-th coordinate.

Now let L be the $|C| \times |C|$ matrix whose j-th column is the vector $M \tilde{X}^j$. A square matrix A is
said to be diagonally dominant if $|a_{ii}| > \sum_{j \neq i} |a_{ij}|$ for all i. Properties (1) and (2) above imply that
the transpose of L is diagonally dominant. It is well known that any diagonally dominant matrix
must be of full rank (a proof is given in the appendix). Since L is full rank and each column of L
is in the image of M, it follows that the image under M of $\mathbb{R}^{N_0}$ is all of $\mathbb{R}^{|C|}$, and hence $N_0 \geq |C|$.
Finally, since $N_0 = \sum_{i=0}^{2T} \binom{N}{i} \leq N^{2T}$, we have $T \geq \frac{\log |C|}{2 \log N} = \frac{\log |C|}{2n}$, which proves the theorem. ■
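
The counting step at the end of this proof can be reproduced numerically. The sketch below is illustrative, with function names introduced here: it builds the monomial vector for a 0/1 input, counts the monomials, and evaluates the resulting bound $T \geq \log |C| / (2n)$.

```python
from itertools import combinations
from math import comb, log2

def monomial_vector(X, degree):
    """Values at the 0/1 tuple X of all multilinear monomials of degree <= `degree`."""
    vec = []
    for d in range(degree + 1):
        for idxs in combinations(range(len(X)), d):
            value = 1
            for i in idxs:
                value *= X[i]
            vec.append(value)
    return vec

def n_zero(N, T):
    """N_0 = sum_{i=0}^{2T} C(N, i), the number of such monomials."""
    return sum(comb(N, i) for i in range(2 * T + 1))

def query_lower_bound(num_concepts, n):
    """T >= log|C| / (2n), obtained from |C| <= N_0 <= N^(2T) with N = 2^n."""
    return log2(num_concepts) / (2 * n)

# The example from the text: N = 4 and 2T = 2, evaluated at X = (1, 0, 1, 1).
vec = monomial_vector((1, 0, 1, 1), 2)
```

For N = 4 and 2T = 2 there are 11 monomials, matching the vector displayed in the proof, and 11 is below the bound $N^{2T} = 16$.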



The lower bound of Theorem 10 is nearly tight as witnessed by the following example: let C
be the collection of all $2^n$ parity functions over $\{0,1\}^n$, so each function in C is defined by a string
$a \in \{0,1\}^n$ and $c_a(x) = a \cdot x$. The quantum algorithm which solves the well-known Deutsch-Jozsa
problem [15] can be used to exactly identify a and thus learn the target concept with probability
1 from a single query. It follows that the factor of n in the denominator of Theorem 10 cannot be
replaced by any function $g(n) = o(n)$.

3.3 Quantum and Classical Exact Learning are Equivalent 

We have seen two different reasons why exact learning a concept class can require a large number
of (classical) membership queries: the class may contain many similar concepts (i.e. $\hat{\gamma}^C$ is small),
or the class may contain very many concepts (i.e. $\log |C|$ is large). The following lemma, which is
a variant of Theorem 3.1 from [21], shows that these are the only reasons why many membership
queries may be required (the proof is given in Appendix A).

Lemma 11 There is an exact learning algorithm for C which has sample complexity $O((\log |C|)/\hat{\gamma}^C)$.

Using this upper bound we can prove that up to polynomial factors, quantum exact learning is 
no more powerful than classical exact learning. 

Theorem 12 Let C be any concept class. If C is exact learnable from quantum membership queries,
then C is exact learnable from classical membership queries. 

Proof: Suppose that C is not exact learnable from classical membership queries, i.e. for any
polynomial p there are infinitely many values of n such that any learning algorithm for $C_n$ requires
more than $p(n)$ queries in the worst case. By Lemma 11, this means that for any polynomial p
there are infinitely many values of n such that $(\log |C_n|)/\hat{\gamma}^{C_n} > p(n)$. At least one of the following
conditions must hold: (1) for any polynomial p there are infinitely many values of n such that
$p(n) < 1/\hat{\gamma}^{C_n}$; or (2) for any polynomial p there are infinitely many values of n such that $p(n) <
\log |C_n|$. Theorems 7 and 10 show that in either case C cannot be exact learnable from a polynomial
number of quantum membership queries. ■

In the opposite direction, it is easy to see that a $QMQ_c$ oracle can be used to simulate the
corresponding $MQ_c$ oracle, so any concept class which is exact learnable from classical membership
queries is also exact learnable from quantum membership queries. This proves Theorem 1.

3.4 Discussion 



Theorem 12 provides an interesting contrast to several known results for black-box quantum
computation. Let F denote the set of all $2^{2^n}$ functions from $\{0,1\}^n$ to $\{0,1\}$. Beals et al. [?] have
shown that if $f : F \to \{0,1\}$ is any total function (i.e. $f(c)$ is defined for every possible concept c
over $\{0,1\}^n$), then the query complexity of any quantum network which computes f is polynomially
related to the number of classical black-box queries required to compute f. This result is interesting
because it is well known [?], [10], [15], [27] that for certain concept classes $C \subseteq F$ and partial functions
$f : C \to \{0,1\}$, the quantum black-box query complexity of f can be exponentially smaller than
the classical black-box query complexity.

Our Theorem 12 provides a sort of dual to the results of Beals et al.: their bound on query
complexity holds only for the fixed concept class F but for any function $f : F \to \{0,1\}$, while
our bound holds for any concept class $C \subseteq F$ but only for the fixed problem of exact learning. In
general, the problem of computing a function $f : C \to \{0,1\}$ from black-box queries can be viewed
as an "easier" version of the corresponding exact learning problem: instead of having to figure out
only one bit of information about the unknown concept c (the value of f), in the learning framework
the algorithm must identify c exactly. Theorem 12 shows that for this more demanding problem,
unlike the results in [?], [10], [15], [27], there is no "clever" way of restricting the concept class C so
that learning becomes substantially easier in the quantum setting than in the classical setting.



4 PAC Learning from a Quantum Example Oracle 

4.1 The Quantum Example Oracle 

Bshouty and Jackson [12] have introduced a natural quantum generalization of the standard PAC-model
example oracle. While a standard PAC example oracle $EX(c,\mathcal{D})$ generates each example
$(x,c(x))$ with probability $\mathcal{D}(x)$, where $\mathcal{D}$ is a distribution over
$\{0,1\}^n$, a quantum PAC example oracle $QEX(c,\mathcal{D})$ generates a superposition of all
labeled examples, where each labeled example $(x,c(x))$ appears in the superposition with amplitude
proportional to the square root of $\mathcal{D}(x)$. More formally, a $QEX(c,\mathcal{D})$ gate maps
the initial basis state $|0^n, 0\rangle$ to the state
$\sum_{x \in \{0,1\}^n} \sqrt{\mathcal{D}(x)}\, |x, c(x)\rangle$.
(We leave the action of a $QEX(c,\mathcal{D})$ gate undefined on other basis states, and stipulate
that any quantum network which includes $T$ $QEX(c,\mathcal{D})$ gates must have all $T$ gates at
the "bottom of the circuit," i.e. no gate may occur on any wire between the inputs and any
$QEX(c,\mathcal{D})$ gate.) A quantum network with $T$ $QEX(c,\mathcal{D})$ gates is said to be a
QEX network with query complexity $T$.
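The state prepared by a single QEX gate can be tabulated classically for small $n$. The following is a minimal sketch, not part of the original construction: the names `qex_state`, `parity`, and `uniform` are illustrative assumptions, and the state is stored as a dictionary from basis labels $(x, c(x))$ to real amplitudes.

```python
import itertools
import math


def qex_state(concept, dist, n):
    """Tabulate the state a QEX(c, D) gate prepares from |0^n, 0>.

    concept: function mapping an n-bit tuple x to the label c(x) in {0, 1}
    dist:    dict mapping n-bit tuples to probabilities D(x)
    Returns a dict {(x, c(x)): amplitude} with amplitude sqrt(D(x)).
    """
    state = {}
    for x in itertools.product((0, 1), repeat=n):
        p = dist.get(x, 0.0)
        if p > 0.0:
            state[(x, concept(x))] = math.sqrt(p)
    return state


def parity(x):
    # Hypothetical target concept: parity of all input bits.
    return sum(x) % 2


# Uniform distribution over {0,1}^2 (an assumption made for the example).
uniform = {x: 1 / 4 for x in itertools.product((0, 1), repeat=2)}
state = qex_state(parity, uniform, 2)

# The squared amplitudes recover D, so they sum to 1.
norm = sum(a * a for a in state.values())
```

Only correctly labeled pairs $(x, c(x))$ receive nonzero amplitude, which is why the dictionary has one entry per point in the support of $\mathcal{D}$.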

A quantum PAC learning algorithm for $C$ is a family
$\{\mathcal{N}_{(n,\epsilon,\delta)} : n \geq 1, 0 < \epsilon, \delta < 1\}$ of QEX networks with
the following property: for all $n \geq 1$ and $0 < \epsilon, \delta < 1$, for all $c \in C_n$, for
all distributions $\mathcal{D}$ over $\{0,1\}^n$, if the network $\mathcal{N}_{(n,\epsilon,\delta)}$
has all its oracle gates instantiated as $QEX(c,\mathcal{D})$ gates, then with probability at least
$1 - \delta$ the network $\mathcal{N}_{(n,\epsilon,\delta)}$ outputs a representation of a circuit
$h$ which is an $\epsilon$-approximator to $c$ under $\mathcal{D}$. The quantum sample complexity
$T(n,\epsilon,\delta)$ of a quantum PAC algorithm is the query complexity of
$\mathcal{N}_{(n,\epsilon,\delta)}$. A concept class $C$ is quantum PAC learnable if there is a
quantum PAC learning algorithm for $C$ which has poly$(n, \frac{1}{\epsilon}, \frac{1}{\delta})$
sample complexity, and we say that $C$ is efficiently quantum PAC learnable if each network
$\mathcal{N}_{(n,\epsilon,\delta)}$ is of size poly$(n, \frac{1}{\epsilon}, \frac{1}{\delta})$.

4.2 Lower Bounds on Classical and Quantum PAC Learning 

Throughout this section for ease of notation we omit the subscript $n$ and write $C$ for $C_n$. We
view each concept $c \in C$ as a subset of $\{0,1\}^n$. For $S \subseteq \{0,1\}^n$, we write
$\Pi_C(S)$ to denote $\{c \cap S : c \in C\}$, so $|\Pi_C(S)|$ is the number of different
"dichotomies" which the concepts in $C$ induce on the points in $S$. A subset
$S \subseteq \{0,1\}^n$ is said to be shattered by $C$ if $|\Pi_C(S)| = 2^{|S|}$, i.e. if $C$
induces every possible dichotomy on the points in $S$. The Vapnik-Chervonenkis dimension of $C$,
VC-DIM($C$), is the size of the largest subset $S \subseteq \{0,1\}^n$ which is shattered by $C$.
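For very small classes these definitions can be checked by brute force. The sketch below is illustrative only: it assumes concepts are given as Python sets over a small finite domain (integers stand in for points of $\{0,1\}^n$), and the names `dichotomies`, `is_shattered`, and `vc_dim` are hypothetical.

```python
from itertools import combinations


def dichotomies(concepts, points):
    """Pi_C(S): the distinct restrictions c ∩ S induced by the concepts."""
    s = frozenset(points)
    return {frozenset(c) & s for c in concepts}


def is_shattered(concepts, points):
    """S is shattered by C when all 2^|S| dichotomies are induced."""
    return len(dichotomies(concepts, points)) == 2 ** len(points)


def vc_dim(concepts, domain):
    """Size of the largest shattered subset of the domain (exponential time)."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(concepts, s) for s in combinations(domain, k)):
            best = k
        else:
            # Every subset of a shattered set is shattered, so we can stop.
            break
    return best


domain = range(4)
# Threshold concepts {0, ..., k-1}: no pair of points is shattered.
thresholds = [set(range(k)) for k in range(5)]
# Interval concepts {i, ..., j-1}: pairs are shattered, triples are not.
intervals = [set(range(i, j)) for i in range(5) for j in range(i, 5)]
```

Here `vc_dim(thresholds, domain)` evaluates to 1 and `vc_dim(intervals, domain)` to 2, matching the usual values for these two classes.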

Well-known results in computational learning theory show that the Vapnik-Chervonenkis dimension
of a concept class $C$ characterizes the number of calls to $EX(c,\mathcal{D})$ which are
information-theoretically necessary and sufficient to PAC learn $C$. For the lower bound, the
following theorem is (a slight simplification of) a result due to Blumer et al. (Theorem 2.1.ii.b);
a proof sketch is given in Appendix A. (A stronger bound was later given by Ehrenfeucht et al.)

Theorem 13 Let $C$ be any concept class and $d$ = VC-DIM($C$). Then any (classical) PAC learning
algorithm for $C$ must have sample complexity $\Omega(d)$.

The following theorem is a quantum analogue of Theorem 13; the proof, which extends the
techniques used in the proof of Theorem 10 using ideas from error-correcting codes, is given in
Appendix B.

Theorem 14 Let $C$ be any concept class and $d$ = VC-DIM($C$). Then any quantum PAC learning
algorithm for $C$ must have quantum sample complexity $\Omega(\frac{d}{n})$.

Since the class of parity functions over $\{0,1\}^n$ has Vapnik-Chervonenkis dimension $n$, as in
Section 3.2.2 the factor of $n$ in the denominator of Theorem 14 cannot be replaced by any function
$g(n) = o(n)$.

4.3 Quantum and Classical PAC Learning are Equivalent 

A well-known theorem due to Blumer et al. (Theorem 3.2.1.ii.a) shows that the VC-dimension
of a concept class bounds the number of $EX(c,\mathcal{D})$ calls required for (classical) PAC
learning:

Theorem 15 Let $C$ be any concept class and $d$ = VC-DIM($C$). There is a (classical) PAC learning
algorithm for $C$ which has sample complexity
$O(\frac{1}{\epsilon} \log \frac{1}{\delta} + \frac{d}{\epsilon} \log \frac{1}{\epsilon})$.

The proof of Theorem 15 is quite complex so we do not attempt to sketch it. As in Section 3.3,
this upper bound along with our lower bound from Theorem 14 together yield:



Theorem 16 Let C be any concept class. If C is quantum PAC learnable, then C is (classically) 
PAC learnable. 
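The step from Theorems 14 and 15 to Theorem 16 can be spelled out as below. This is only a sketch: it assumes the $\Omega(d/n)$ bound of Theorem 14 is applied at the fixed parameters $\epsilon = 1/10$, $\delta = 1/3$ used in Appendix B, and the symbols $d_n$ (for VC-DIM($C_n$)) and $k$ (for the hidden constant) are introduced here purely for illustration.

```latex
% T(n, \epsilon, \delta): quantum sample complexity of the assumed quantum PAC learner
% d_n = VC-DIM(C_n); k > 0 is the constant hidden in the \Omega of Theorem 14
\begin{align*}
T\!\left(n, \tfrac{1}{10}, \tfrac{1}{3}\right) \ge k \cdot \frac{d_n}{n}
  \;&\Longrightarrow\;
  d_n \le \frac{n}{k}\, T\!\left(n, \tfrac{1}{10}, \tfrac{1}{3}\right) = \mathrm{poly}(n), \\
d_n = \mathrm{poly}(n)
  \;&\Longrightarrow\;
  O\!\left(\frac{1}{\epsilon}\log\frac{1}{\delta}
         + \frac{d_n}{\epsilon}\log\frac{1}{\epsilon}\right)
  = \mathrm{poly}\!\left(n, \frac{1}{\epsilon}, \frac{1}{\delta}\right).
\end{align*}
```

The first line uses only the quantum lower bound, and the second substitutes the resulting polynomial bound on $d_n$ into the classical upper bound of Theorem 15.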

A $QEX(c,\mathcal{D})$ oracle can be used to simulate the corresponding $EX(c,\mathcal{D})$ oracle
by immediately performing an observation on the QEX gate's outputs; such an observation yields each
example $(x,c(x))$ with probability $\mathcal{D}(x)$. Consequently any concept class which is classically PAC learnable
is also quantum PAC learnable, and Theorem || is proved. 
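The simulation of a classical example oracle by observing a QEX output can be mimicked numerically. The following sketch is an illustration under stated assumptions: the target concept (AND of two bits), the distribution `dist`, and the helper `measure` are all hypothetical, and the observation is modeled by sampling a basis label with probability equal to its squared amplitude.

```python
import math
import random
from collections import Counter


def measure(state, rng):
    """Observe a state {label: amplitude}; return one label with prob |amplitude|^2."""
    labels = list(state)
    weights = [abs(state[label]) ** 2 for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


# Hypothetical non-uniform distribution D over {0,1}^2.
dist = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# QEX output state for the hypothetical concept c(x) = x_1 AND x_2.
state = {(x, x[0] & x[1]): math.sqrt(p) for x, p in dist.items()}

rng = random.Random(0)
trials = 20000
counts = Counter(measure(state, rng) for _ in range(trials))

# Empirical frequency of each example (x, c(x)); it should approximate D(x).
freqs = {x: counts[(x, x[0] & x[1])] / trials for x in dist}
```

Each observed outcome is a correctly labeled example, and the empirical frequencies approach $\mathcal{D}(x)$ as the number of trials grows.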

5 Quantum versus Classical Efficient Learnability 

We have shown that from an information-theoretic perspective, quantum learning is no more
powerful than classical learning (up to polynomial factors). However, we now observe that the
apparent computational advantages of the quantum model yield efficient quantum learning algorithms
which are believed to have no efficient classical counterparts.

A Blum integer is an integer $N = pq$ where $p \neq q$ are $\ell$-bit primes each congruent to 3
modulo 4. It is widely believed that there is no polynomial-time classical algorithm which can
successfully factor a randomly selected Blum integer with nonnegligible success probability.
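As a small concrete illustration of the definition, the following sketch checks whether a pair of integers forms the factors of a Blum integer. It assumes that "$\ell$-bit primes" means the two primes have the same bit length; the function names are hypothetical, and trial division is used only because the example numbers are tiny.

```python
def is_prime(m):
    """Trial-division primality test, adequate only for small m."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def is_blum_pair(p, q):
    """True if p != q are primes of equal bit length, each congruent to 3 mod 4."""
    return (
        p != q
        and p.bit_length() == q.bit_length()
        and p % 4 == 3
        and q % 4 == 3
        and is_prime(p)
        and is_prime(q)
    )


# 19 and 23 are both 5-bit primes congruent to 3 modulo 4.
N = 19 * 23
```

For instance, `is_blum_pair(19, 23)` holds, while `is_blum_pair(13, 17)` fails because both primes are congruent to 1 modulo 4.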

Kearns and Valiant [23] have constructed a concept class $C$ with the following property: a
polynomial-time (classical) PAC learning algorithm for $C$ would yield a polynomial-time algorithm
for factoring Blum integers. Thus, assuming that factoring Blum integers is a computationally
hard problem for classical computation, the Kearns-Valiant concept class $C$ is not efficiently PAC
learnable. On the other hand, in a celebrated result Shor [26] has exhibited a poly($n$) size
quantum network which can factor an arbitrary $n$-bit integer with high success probability. His
construction yields an efficient quantum PAC learning algorithm for the Kearns-Valiant concept
class. We thus have



Observation 17 If there is no polynomial-time classical algorithm for factoring Blum integers,
then there is a concept class $C$ which is efficiently quantum PAC learnable but not efficiently
classically PAC learnable.

(Footnote to Section 4.3: As noted in Section 2.3, intermediate observations during a computation
can always be simulated by a single observation at the end of the computation.)

The hardness results of Kearns and Valiant were later extended by Angluin and Kharitonov.
Using a public-key encryption system which is secure against chosen-ciphertext attack (based
on the assumption that factoring Blum integers is computationally hard for polynomial-time
algorithms), they constructed a concept class $C$ which cannot be learned by any polynomial-time
learning algorithm which makes membership queries. As with the Kearns-Valiant concept class,
though, using Shor's quantum factoring algorithm it is possible to construct an efficient quantum
exact learning algorithm for this concept class. Thus, for the exact learning model as well, we have:



Observation 18 If there is no polynomial-time classical algorithm for factoring Blum integers, 
then there is a concept class C which is efficiently quantum exact learnable from membership queries 
but not efficiently classically exact learnable from membership queries. 



6 Conclusion and Future Directions 

While we have shown that quantum and classical learning are (up to polynomial factors)
information-theoretically equivalent, many interesting questions remain about the relationship
between efficient quantum and classical learnability. One goal is to prove analogues of
Observations 17 and 18 under a weaker computational hardness assumption such as the existence of
any one-way function; it seems plausible that some combination of cryptographic techniques together
with the ideas used in Simon's quantum algorithm [27] might be able to achieve this. Another goal
is to develop efficient quantum learning algorithms for natural concept classes, such as the
polynomial-time quantum algorithm of Bshouty and Jackson [12] for learning DNF formulae from
uniform quantum examples.

A Bounds on Classical Sample Complexity

Proof of Lemma ^: Let $C' \subseteq C$, $|C'| \geq 2$, be such that $\gamma^{C'} = \hat{\gamma}^C$.
Consider the following adversarial strategy for answering queries: given the query string $a$,
answer the bit $b$ which maximizes $\gamma^{C'}_{\langle a,b \rangle}$. This strategy ensures that
each response eliminates at most a $\gamma^{C'}_a \leq \gamma^{C'} = \hat{\gamma}^C$ fraction of the
concepts in $C'$. After $\frac{1}{2\hat{\gamma}^C} - 1$ membership queries, fewer than half of the
concepts in $C'$ have been eliminated, so at least two concepts have not yet been eliminated.
Consequently, it is impossible for $A$ to output a hypothesis which is equivalent to the correct
concept with probability greater than 1/2. (Lemma |) ■

Proof of Lemma |^: Consider the following adversarial strategy for answering queries: if
$C' \subseteq C$ is the set of concepts which have not yet been eliminated by previous responses to
queries, then given the query string $a$, answer the bit $b$ such that
$\gamma^{C'}_{\langle a,b \rangle} \geq \frac{1}{2}$. Under this strategy, after $\log |C| - 1$
membership queries at least two possible target concepts will remain. (Lemma ||) ■



Proof of Lemma 11: Consider the following (classical) learning algorithm $A$: at each stage in
its execution, if $C'$ is the set of concepts in $C$ which have not yet been eliminated by previous
responses to queries, algorithm $A$'s next query string is the string $a \in \{0,1\}^n$ which
maximizes $\gamma^{C'}_a$. By following this strategy, each query response received from the oracle
must eliminate at least a $\gamma^{C'}$ fraction of the set $C'$, so with each query the size of the
set of possible target concepts is multiplied by a factor which is at most
$1 - \gamma^{C'} \leq 1 - \hat{\gamma}^C$. Consequently, after $O((\log |C|)/\hat{\gamma}^C)$
queries, only a single concept will not have been eliminated; this concept must be the target
concept, so $A$ can output a hypothesis $h$ which is equivalent to $c$. (Lemma 11) ■
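The greedy strategy in the proof of Lemma 11 can be sketched for a small explicit concept class. This is an illustration only: concepts are assumed to be given as tuples of bits (truth tables indexed by query point), the membership oracle is a plain Python callable, and the names `gamma_a` and `greedy_exact_learn` are hypothetical.

```python
def gamma_a(remaining, a):
    """Smaller of the two fractions of `remaining` labeling point a with 0 or 1.

    This is the fraction of surviving concepts that a response to the query a
    is guaranteed to eliminate, whichever bit the oracle returns.
    """
    ones = sum(c[a] for c in remaining)
    return min(ones, len(remaining) - ones) / len(remaining)


def greedy_exact_learn(concepts, membership_oracle):
    """Query the point maximizing gamma_a until one consistent concept remains.

    Assumes the concepts are distinct and the target is among them.
    Returns (hypothesis, number_of_queries).
    """
    remaining = list(concepts)
    num_points = len(remaining[0])
    queries = 0
    while len(remaining) > 1:
        a = max(range(num_points), key=lambda p: gamma_a(remaining, p))
        b = membership_oracle(a)
        # Discard every concept that disagrees with the oracle's answer.
        remaining = [c for c in remaining if c[a] == b]
        queries += 1
    return remaining[0], queries


# Hypothetical class: the four singleton concepts over a 4-point domain.
singletons = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
target = (0, 0, 1, 0)
hypothesis, used = greedy_exact_learn(singletons, lambda a: target[a])
```

For the singleton class every query eliminates only a small fraction of the surviving concepts in the worst case, which is the situation where the $1/\hat{\gamma}^C$ factor in the query bound matters.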



Proof Sketch for Theorem 13: The idea behind Theorem 13 is to consider the distribution
$\mathcal{D}$ which is uniform over some shattered set $S$ of size $d$ and assigns zero weight to
points outside of $S$. Any learning algorithm which makes only $d/2$ calls to $EX(c,\mathcal{D})$
will have no information about the value of $c$ on at least $d/2$ points in $S$; moreover, since
the set $S$ is shattered by $C$, any labeling is possible for these unseen points. Since the error
of any hypothesis $h$ under $\mathcal{D}$ is the fraction of points in $S$ where $h$ and the target
concept disagree, a simple analysis shows that no learning algorithm which performs only $d/2$ calls
to $EX(c,\mathcal{D})$ can have high probability (e.g. $1 - \delta = 2/3$) of generating a
low-error hypothesis (e.g. $\epsilon = 1/10$). (Theorem 13) ■

B Proof of Theorem 14



Let $S = \{x^1, \dots, x^d\}$ be a set which is shattered by $C$ and let $\mathcal{D}$ be the
distribution which is uniform on $S$ and assigns zero weight to points outside $S$. If
$h : \{0,1\}^n \to \{0,1\}$ is a Boolean function on $\{0,1\}^n$, we say that the relative distance
of $h$ and $c$ on $S$ is the fraction of points in $S$ on which $h$ and $c$ disagree. We will prove
the following result which is stronger than Theorem 14: Let $\mathcal{N}$ be a quantum network with
QMQ gates such that for all $c \in C$, if $\mathcal{N}$'s oracle gates are $QMQ_c$ gates, then with
probability at least 2/3 the output of $\mathcal{N}$ is a hypothesis $h$ such that the relative
distance of $h$ and $c$ on $S$ is at most 1/10. We will show that such a network $\mathcal{N}$ must
have query complexity at least $\frac{d}{12n}$. Since any QEX network with query complexity $T$ can
be simulated by a QMQ network with query complexity $T$, taking $\epsilon = 1/10$ and
$\delta = 1/3$ will prove Theorem 14.



The argument is a modification of the proof of Theorem 10. Let $\mathcal{N}$ be a quantum network
with query complexity $T$ which satisfies the following condition: for all $c \in C$, if
$\mathcal{N}$'s oracle gates are $QMQ_c$ gates, then with probability at least 2/3 the output of
$\mathcal{N}$ is a representation of a Boolean circuit $h$ such that the relative distance of $h$
and $c$ on $S$ is at most 1/10. By the well-known Gilbert-Varshamov bound from coding theory (see,
e.g., Theorem 5.1.7 of |2J|), there exists a set $s^1, \dots, s^A$ of $d$-bit strings such that for
all $i \neq j$ the strings $s^i$ and $s^j$ differ in at least $d/4$ bit positions, where

$$A \geq \frac{2^d}{\sum_{i=0}^{d/4-1} \binom{d}{i}} \geq \frac{2^d}{\sum_{i=0}^{d/4} \binom{d}{i}} \geq 2^{d(1 - H(1/4))} > 2^{d/6}.$$

(Here $H(p) = -p \log p - (1-p) \log(1-p)$ is the binary entropy function.) For each
$i = 1, \dots, A$ let $c_i \in C$ be a concept such that the $d$-bit string
$c_i(x^1) \cdots c_i(x^d)$ is $s^i$ (such a concept $c_i$ must exist since the set $S$ is shattered
by $C$).
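A code of the kind promised by the Gilbert-Varshamov bound can be produced greedily for small $d$. The sketch below is one standard way to witness the bound, not necessarily the argument of the cited reference; the names `hamming`, `greedy_code`, and `gv_bound` are hypothetical, and the exhaustive scan over all $2^d$ strings is only feasible for tiny $d$.

```python
from itertools import product
from math import comb


def hamming(u, v):
    """Number of bit positions in which the strings u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def greedy_code(d, min_dist):
    """Scan all d-bit strings, keeping each one far enough from those kept so far."""
    code = []
    for word in product((0, 1), repeat=d):
        if all(hamming(word, w) >= min_dist for w in code):
            code.append(word)
    return code


def gv_bound(d, min_dist):
    """2^d divided by the volume of a Hamming ball of radius min_dist - 1."""
    return 2 ** d / sum(comb(d, i) for i in range(min_dist))


d = 8
code = greedy_code(d, d // 4)
```

Every rejected string lies within distance `min_dist - 1` of some kept string, so the balls of that radius around the codewords cover all $2^d$ strings; this is why the greedy code always meets `gv_bound`.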

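For $i = 1, \dots, A$ let $B_i \subseteq \{0,1\}^m$ be the collection of those basis states which
are such that if the final observation performed by $\mathcal{N}$ yields a state from $B_i$, then
the output of $\mathcal{N}$ is a hypothesis $h$ such that $h$ and $c_i$ have relative distance at
most 1/10 on $S$. Since each pair of concepts $c_i, c_j$ has relative distance at least 1/4 on $S$,
the sets $B_i$ and $B_j$ are disjoint for all $i \neq j$ (a hypothesis within relative distance
1/10 of both $c_i$ and $c_j$ would force their relative distance to be at most 1/5 < 1/4).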



As in Section |J let $N = 2^n$ and let $X^j = (X^j_0, \dots, X^j_{N-1}) \in \{0,1\}^N$ where $X^j$
is the $N$-tuple representation of the concept $c_j$. By Lemma p|, for each $i = 1, \dots, A$ there
is a real-valued multilinear polynomial $P_i$ of degree at most $2T$ such that for all
$j = 1, \dots, A$, the value of $P_i(X^j)$ is precisely the probability that the final observation
on $\mathcal{N}$ yields a state from $B_i$ provided that the oracle gates are $QMQ_{c_j}$ gates.
Since, by assumption, if $c_i$ is the target concept then with probability at least 2/3
$\mathcal{N}$ generates a hypothesis which has relative distance at most 1/10 from $c_i$ on $S$,
the polynomials $P_i$ have the following properties:

1. $P_i(X^i) \geq 2/3$ for all $i = 1, \dots, A$;

2. For any $j = 1, \dots, A$ we have that $\sum_{i \neq j} P_i(X^j) \leq 1/3$ (since the $B_i$'s
are disjoint and the total probability across all observations is 1).

Let $N_0$ and $\tilde{X}$ be defined as in the proof of Theorem 10. For $i = 1, \dots, A$ let
$V_i \in \mathbb{R}^{N_0}$ be the column vector which corresponds to the coefficients of the
polynomial $P_i$, so $V_i^T \tilde{X} = P_i(X)$. Let $M$ be the $A \times N_0$ matrix whose $i$-th
row is the vector $V_i^T$, so multiplication by $M$ is a linear transformation from
$\mathbb{R}^{N_0}$ to $\mathbb{R}^A$. The product $M\tilde{X}$ is a column vector in $\mathbb{R}^A$
which has $P_i(X)$ as its $i$-th coordinate.

Now let $L$ be the $A \times A$ matrix whose $j$-th column is the vector $M\tilde{X}^j$. As in the
proof of Theorem 10 we have that the transpose of $L$ is diagonally dominant, so $L$ is of full
rank and hence $N_0 \geq A$. Since $A > 2^{d/6}$ we thus have that
$T > \frac{d/6}{2 \log N} = \frac{d}{12n}$, and the theorem is proved. (Theorem 14) ■



C A diagonally dominant matrix has full rank 

This fact follows from the following theorem (see, e.g., Theorem 6.1.17 of [25]).

Theorem 19 (Gershgorin's Circle Theorem) Let $A$ be a real or complex-valued $n \times n$ matrix.
Let $S_i$ be the disk in the complex plane whose center is $a_{ii}$ and whose radius is
$r_i = \sum_{j \neq i} |a_{ij}|$. Then every eigenvalue of $A$ lies in the union of the disks
$S_1, \dots, S_n$.

Proof: If $\lambda$ is an eigenvalue of $A$ which has corresponding eigenvector
$x = (x_1, \dots, x_n)$, then since $Ax = \lambda x$ we have

$$(\lambda - a_{ii}) x_i = \sum_{j \neq i} a_{ij} x_j \quad \text{for } i = 1, \dots, n.$$

Without loss of generality we may assume that $\|x\|_\infty = 1$, so $|x_k| = 1$ for some $k$ and
$|x_j| \leq 1$ for $j \neq k$. Thus

$$|\lambda - a_{kk}| = |(\lambda - a_{kk}) x_k| \leq \sum_{j \neq k} |a_{kj}| |x_j| \leq \sum_{j \neq k} |a_{kj}|$$

and hence $\lambda$ is in the disk $S_k$. ■

For a diagonally dominant matrix the radius $r_i$ of each disk $S_i$ is less than its distance from
the origin, which is $|a_{ii}|$. Hence 0 cannot be an eigenvalue of a diagonally dominant matrix,
so the matrix must have full rank.
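The argument can be checked numerically on a small example. The sketch below is illustrative: the function names `gershgorin_disks` and `zero_excluded` and the sample matrices are assumptions, and the check covers only the condition used in the text (each disk excludes the origin), not an eigenvalue computation.

```python
def gershgorin_disks(matrix):
    """Return (center, radius) for each row: a_ii and the sum of |a_ij| over j != i."""
    disks = []
    for i, row in enumerate(matrix):
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        disks.append((row[i], radius))
    return disks


def zero_excluded(matrix):
    """True when every Gershgorin disk misses the origin (strict diagonal dominance)."""
    return all(abs(center) > radius for center, radius in gershgorin_disks(matrix))


# Strictly diagonally dominant: disks (4, 2), (3, 1.5), (-5, 2) all exclude 0.
dominant = [[4, 1, -1], [0.5, 3, 1], [1, 1, -5]]

# Not diagonally dominant: both disks are centered at 1 with radius 2.
not_dominant = [[1, 2], [2, 1]]
```

When `zero_excluded` returns `True`, Theorem 19 implies that 0 is not an eigenvalue, so the matrix is nonsingular; a `False` result is inconclusive rather than a proof of singularity.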